Abstract. We propose a new general approach for estimating the effect of a binary
treatment on a continuous and potentially highly skewed response variable,
the generalized quantile treatment effect (GQTE). The GQTE is defined as the
difference between a function of the quantiles under the two treatment conditions.
As such, it generalizes the standard approaches typically used
for estimating a treatment effect (i.e., the average treatment effect and the quantile
treatment effect) because it allows the comparison of any arbitrary characteristic
of the outcome’s distribution under the two treatments. Following Dominici et al.
(2005), we assume that a pre-specified transformation of the two quantiles is modeled
as a smooth function of the percentiles. This assumption allows us to link the
two quantile functions and thus to borrow information from one distribution to
the other. The main theoretical contribution we provide is the analytical derivation
of a closed form expression for the likelihood of the model. Exploiting this
result we propose a novel Bayesian inferential methodology for the GQTE. We
show some finite sample properties of our approach through a simulation study
which confirms that in some cases it performs better than other nonparametric
methods. As an illustration we finally apply our methodology to the 1987 National
Medical Expenditure Survey (NMES) data to estimate the difference in the single
hospitalization medical cost distributions between cases (i.e., subjects affected by
smoking attributable diseases) and controls.

Keywords: average treatment effect (ATE), medical expenditures, National 
Medical Expenditures Survey (NMES), Q-Q plot, quantile function, quantile 
treatment effect (QTE), tailweight. 


1 Introduction 

The effect of a treatment on an outcome is often the main parameter of interest in many
scientific fields. The standard approach used to estimate it is the so-called average
treatment effect (ATE), the difference between the expected values of the response’s
distributions under the two treatment regimes. While intuitive and useful in many
situations, it suffers from some limitations; in particular, it becomes highly biased when
the response is skewed.

A further drawback of the ATE is its coarseness as a summary of the distance
between the response’s distributions under the two treatments. Indeed, the effect of
the treatment on the outcome often varies as
we move from the lower to the upper tail of the outcome’s distribution. This limitation
of the ATE has been addressed in the literature by introducing the so-called quantile
treatment effect (QTE), the difference between the response’s distribution quantiles
under the two treatments (Abadie et al., 2002; Chernozhukov and Hansen, 2005; Firpo,
2007; Frölich and Melly, 2008).

In this paper we propose a more general measure of the effect of a binary treatment 
on a continuous outcome. We call it the generalized quantile treatment effect (GQTE), 
defined as 

$$\Delta_g(p) = g(Q_1(p)) - g(Q_2(p)), \tag{1}$$

where $Q_1(p)$ and $Q_2(p)$ represent the quantile functions of the outcome under the two
treatment conditions and $g(\cdot)$ is an arbitrary but known function of the quantiles. For
example, if $g(\cdot)$ is chosen to be the identity function, then the GQTE simplifies to the
QTE, while if $g(\cdot)$ is the integral over the percentile $p$, the GQTE becomes equivalent
to the ATE. The GQTE is a new parameter which generalizes the existing approaches
for estimating a treatment effect.

To estimate and formulate inferences about the GQTE we propose a Bayesian approach
that can accommodate both symmetric and skewed outcomes, as well as situations
where the sample size under one treatment condition (cases) is much smaller than
the sample size under the other treatment condition (controls). In particular, we assume

$$h\left\{\frac{Q_1(p)}{Q_2(p)}\right\} = s(p), \tag{2}$$

where $h$ is a monotone function and $s$ is assumed to be smooth. In other words, we
assume that the transformed quantile ratio is a smooth function of the percentile $p$.
The idea of smoothly modeling the ratio of the quantiles was first introduced by
Dominici et al. (2005), who exploited it to propose a nonparametric estimator of
the mean difference between two populations. Here we generalize their approach by
permitting the comparison of any characteristic of the outcome’s distributions under
the two treatments.

An important theoretical contribution of this paper is the derivation of a closed form
expression for the model likelihood. We show that it is possible to obtain an analytically
tractable form for the $Y_2$ density (the controls) without explicitly specifying a model for
it. Clearly the likelihood is needed to carry out the Bayesian estimation but in principle
it could be employed for classic likelihood procedures as well. Moreover, our proposed
approach allows one to borrow strength from one sample to the other, thus improving
efficiency in the estimation of the quantiles (Dominici et al., 2005).

As an illustration, we apply our method to the comparison of the single hospitalization
medical costs distribution between subjects with and without smoking attributable
diseases. The data set we use is the National Medical Expenditures Survey (NMES)
supplemented by the Adult Self-Administered Questionnaire Household Survey.

The paper is organized as follows. In Section 2 we define the new parameter $\Delta_g(p)$
and illustrate some quantile-based measures that will be used in the paper. In Section 3
we provide details of the estimation approach together with some special cases. We
then present the results of a simulation study in Section 4 through which we conclude
that under a broad set of conditions our approach performs better than other flexible
methods for comparing two distributions. In Section 5 we illustrate the results of the
data analysis on the NMES data set. Section 6 concludes the paper with a discussion
and some final remarks.


2 The Generalized Quantile Treatment Effect (GQTE) 

Consider two positive continuous random variables $Y_1$ and $Y_2$ with quantile functions
$Q_1$ and $Q_2$, where

$$Q_i(p) = F_i^{-1}(p) = \inf\{y : F_i(y) \ge p\}$$

for $0 < p < 1$ and $i = 1, 2$. To compare $F_1$ and $F_2$ as flexibly as possible we introduce
the generalized quantile treatment effect, which is defined as

$$\Delta_g(p) = g(Q_1(p)) - g(Q_2(p)), \tag{3}$$

where $g(\cdot)$ is a known function of the quantiles. Notice that no a priori assumptions are
made about the admissible functions $g(\cdot)$, thus potentially any function of the quantiles
can be used. Therefore, the GQTE provides a general approach to compare the
response’s distributions under the two treatments. More precisely, by properly choosing
the function $g(\cdot)$, we can recover any specific characteristic of the outcome’s distributions
$F_1$ and $F_2$ and, through (3), their difference.

The simplest case arises when $g(x) = x$. In this case the GQTE simplifies to

$$\Delta(p) = Q_1(p) - Q_2(p), \tag{4}$$

the so-called (unconditional) QTE (Frölich and Melly, 2008), sometimes also named
the percentile-specific effect between two populations (see for example Dominici et al.,
2006, 2007).

A second example is obtained by choosing $g(x) = \int_0^1 x \, dp$, which produces

$$\Delta = \int_0^1 Q_1(p)\,dp - \int_0^1 Q_2(p)\,dp, \tag{5}$$

the extensively used ATE (see for example Wooldridge, 2010, Chapter 21).

These examples illustrate how the GQTE reduces to the two most used parameters 
of interest for estimating a treatment effect, the ATE and QTE. However, the GQTE 
can provide a variety of other useful measures. In Appendix 1 we illustrate some other 
interesting cases that usually are not taken into consideration in the literature. 
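
As a concrete illustration of the two special cases above, the following Python sketch evaluates the QTE at the median and approximates the ATE by numerically integrating the quantile functions. The log-normal parameters, the helper names (`gqte`, `mean_from_quantiles`, `lognormal_quantile`) and the midpoint rule are assumptions made for this example only; they are not taken from the paper.

```python
import math
from statistics import NormalDist


def gqte(g, q1, q2, p):
    """Delta_g(p) = g(Q1(p)) - g(Q2(p)) for a pointwise function g."""
    return g(q1(p)) - g(q2(p))


def mean_from_quantiles(q, n_grid=20000):
    """Approximate the integral of Q(p) over (0, 1) with the midpoint rule."""
    return sum(q((k + 0.5) / n_grid) for k in range(n_grid)) / n_grid


def lognormal_quantile(mu, sigma):
    """Quantile function of a log-normal distribution (illustrative choice)."""
    z = NormalDist()
    return lambda p: math.exp(mu + sigma * z.inv_cdf(p))


# Hypothetical populations, chosen only to keep the numerical integral stable.
q1 = lognormal_quantile(1.0, 0.5)
q2 = lognormal_quantile(0.5, 0.4)

# g = identity gives the QTE, here evaluated at the median.
qte_median = gqte(lambda x: x, q1, q2, 0.5)

# g = integral over p gives the ATE, i.e. the difference of the two means.
ate = mean_from_quantiles(q1) - mean_from_quantiles(q2)
```

For log-normal populations the two quantities can be checked against the closed forms $e^{\mu_1} - e^{\mu_2}$ (median difference) and $e^{\mu_1 + \sigma_1^2/2} - e^{\mu_2 + \sigma_2^2/2}$ (mean difference).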


3 Estimation Methodology

In this section we illustrate the procedure we developed for estimating the GQTE. Our
proposed approach is sufficiently general that it can be used for any choice of $g(\cdot)$.

3.1 Definitions and Model Assumptions 

We assume that $Y_1 \mid \eta \sim F_1(\cdot\,; \eta)$, where $F_1$ is a given probability distribution depending
upon a vector of unknown parameters $\eta$. For example, in the application presented
in Section 5 we choose $F_1$ as a mixture distribution. To borrow information from one
distribution to the other, we assume that the transformed quantile ratio is a smooth
function of the percentiles with $\lambda$ degrees of freedom, that is

$$h\left\{\frac{Q_1(p)}{Q_2(p)}\right\} = s(p, \lambda). \tag{6}$$

The function $h(\cdot)$ is assumed to be monotone and differentiable. It acts as a kind of
link function and is used to transform the quantile ratio to account for the potential
skewness of $F_1$ and $F_2$. The typical choice for skewed data is $h(x) = \log(x)$, while for
symmetric data distributions the identity function is the most reasonable option.

For the sake of simplicity, we henceforth express the smooth function $s(p, \lambda)$ through
the corresponding design matrix $X(p, \lambda)$, so that it can be written as
$X(p, \lambda)\beta$, where $\beta$ is a vector of unknown parameters. More explicitly, we assume that
$s(p, \lambda) = X(p, \lambda)\beta = \sum_{k=0}^{\lambda} X_k(p)\beta_k$, where $X_k(p)$ are orthonormal basis functions with
$X_0(p) = 1$. The number of degrees of freedom $\lambda$ is a further parameter that has either to
be chosen or estimated from the data. In Subsection 3.7 we propose a simple approach
for eliciting it. The basis functions are usually either splines or polynomials.

The main justification for assuming (6) is that it allows one to borrow information
from both the response’s distributions under the two treatment conditions when we
estimate the GQTE $\Delta_g(p)$. Assumption (6), in fact, implies

$$Q_1(p) = Q_2(p)\, h^{-1}[X(p, \lambda)\beta], \tag{7}$$

and also

$$Q_2(p) = Q_1(p) \left\{h^{-1}[X(p, \lambda)\beta]\right\}^{-1}, \tag{8}$$

which, once substituted in (3), return

$$\Delta_g(p) = g\left(Q_2(p)\, h^{-1}[X(p, \lambda)\beta]\right) - g\left(Q_1(p) \left\{h^{-1}[X(p, \lambda)\beta]\right\}^{-1}\right).$$

For the special case where $g(x) = x$ and $h(x) = \log(x)$, Dominici et al. (2005) have
shown that under assumption (6) it is possible to obtain a more efficient estimator of $\Delta$
than the sample mean difference and the maximum likelihood estimator assuming that
$Y_1$ and $Y_2$ are both log-normal.

Notice that, since the main interest in the paper resides in the estimation of $\Delta_g(p)$,
for which only $\beta$ is required, $\eta$ is treated as a nuisance parameter (see Subsection 3.6 for further
details).

In this paper we propose a Bayesian approach for estimating $\Delta_g(p)$ for any choice of
$g(\cdot)$ and $h(\cdot)$. An interesting feature of our estimation procedure for $\Delta_g(p)$ is that we
only need to specify the distribution function for $Y_1$. The specification of $F_1$ together
with the relationship (6) automatically determines a distributional assumption for $Y_2$.
We refer to the distribution of $Y_2$ induced by $F_1$ and assumption (6) as $F_2(\cdot\,; \beta, \eta)$.

As a last remark for this section, we want to highlight the difference between the
function $g(\cdot)$, introduced in the previous section, and $h(\cdot)$, defined above in (6). They
should not be confused because they have distinct roles: the former identifies the
response’s characteristic we want to estimate for assessing the treatment effect, while the
latter has been introduced as a mechanism to attenuate the possible skewness present
in the data.

3.2 Estimation Approach and Likelihood 

The steps involved in our estimation approach are summarized as follows: 

1. Choose a (possibly flexible) density $f_1(y_1 \mid \eta)$ for $Y_1$, a smoothing function $s(p, \lambda)$
(usually a spline or a polynomial) and a value for $\lambda$;

2. From (8) derive the density function of $Y_2$, that we denote as $f_2(y_2 \mid \beta, \eta)$. Note
that, as proved by Theorem 1 below, this density will depend on the model parameter
$\beta$ as well as on the parameter $\eta$ through the $Y_1$ density.

3. Calculate the joint likelihood $L(\beta, \eta \mid y_1, y_2)$ to use for finding the posterior
distribution of $(\beta, \eta)$ in a Markov Chain Monte Carlo (MCMC) algorithm.

4. Obtain the posterior distribution of any special case of the GQTE.

The critical step in this sequence is represented by the calculation of the likelihood, 
which we now describe. 

Consider two i.i.d. samples $(y_{11}, \ldots, y_{1n_1})$ and $(y_{21}, \ldots, y_{2n_2})$ drawn independently
from the two populations $F_1(\cdot\,; \eta)$ and $F_2(\cdot\,; \beta, \eta)$. We refer to the former as the cases (or
the treated) and to the latter as the controls (or the untreated). We assume that these
distribution functions have densities $f_1(\cdot\,; \eta)$ and $f_2(\cdot\,; \beta, \eta)$ respectively. The likelihood
function for our model is then given by

$$L(\beta, \eta \mid y_{11}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{2n_2}) = \prod_{i=1}^{n_1} f_1(y_{1i} \mid \eta) \prod_{j=1}^{n_2} f_2(y_{2j} \mid \beta, \eta). \tag{9}$$

Since we did not state any specific distributional assumption for $Y_2$, in principle we
could not calculate the likelihood because we do not have any expression for $f_2$. Two
strategies are possible here. Given the $f_1$ specification, one possibility is to find an
expression for $Q_1$, then map it through equation (6) to find a corresponding expression
for $Q_2$, invert it to determine $F_2$, and finally differentiate the result to get $f_2$. Apart
from simple situations, usually these steps (i.e. integration, inversion and differentiation)
need to be performed numerically. A second possibility is to replace the $Y_2$ density in the
likelihood with the corresponding density quantile function $f_2(Q_2(p_j) \mid \beta, \eta)$ (see Parzen,
1979), for which the next theorem provides a closed form expression. The proof of the
theorem and two additional corollaries are available in Appendix 2, while in the next
subsection we provide some further explanation on how to compute $f_2$.

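Theorem 1. Let $Y_1 \mid \eta \sim F_1(\cdot\,; \eta)$, with $F_1$ having density function $f_1(\cdot\,; \eta)$, and assume
that (6) holds. If, for every $0 < p < 1$, the vector $\beta$ satisfies the constraint

$$X'(p, \lambda)\beta \, \frac{\partial h^{-1}[X(p, \lambda)\beta]}{\partial (X(p, \lambda)\beta)} < \frac{h^{-1}[X(p, \lambda)\beta]}{f_1(Q_1(p) \mid \eta)\, Q_1(p)}, \tag{10}$$

the density quantile function $f_2(Q_2(p) \mid \beta, \eta)$ for $Y_2$ is

$$f_2(Q_2(p) \mid \beta, \eta) = \frac{f_1\left(Q_2(p)\, h^{-1}[X(p, \lambda)\beta] \mid \eta\right) h^{-1}[X(p, \lambda)\beta]}{1 - f_1\left(Q_2(p)\, h^{-1}[X(p, \lambda)\beta] \mid \eta\right) X'(p, \lambda)\beta \, Q_2(p) \, \dfrac{\partial h^{-1}[X(p, \lambda)\beta]}{\partial (X(p, \lambda)\beta)}}. \tag{11}$$

The function $f_2(Q_2(p) \mid \beta, \eta)$ is a properly defined density.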


Note that $f_2$ correctly depends upon both the model parameter $\beta$ and the $Y_1$ parameter
$\eta$ through the $f_1$ density. The motivation for the constraint (10) comes from the
need to guarantee that $f_2$ is a non-negative function. As a further remark, we observe
that the term in the likelihood involving $f_2(Q_2(p_j) \mid \beta, \eta)$ depends upon the observations
$y_{2j}$ through the unknown quantile function values $Q_2(p_j)$.
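
The structure of (11) and of the constraint (10) can be motivated by a short calculation. What follows is only a sketch under the assumption that $Q_1$ and $X(p, \lambda)$ are differentiable in $p$; it is not the authors' proof, which is given in Appendix 2. The shorthand $r(p)$ and the quantile densities $q_i(p) = dQ_i(p)/dp = 1/f_i(Q_i(p))$ are notation introduced here for convenience.

```latex
\begin{align*}
r(p) &:= h^{-1}[X(p,\lambda)\beta], \qquad Q_2(p) = \frac{Q_1(p)}{r(p)} \quad \text{by (8)}, \\
r'(p) &= X'(p,\lambda)\beta \,
         \frac{\partial h^{-1}[X(p,\lambda)\beta]}{\partial (X(p,\lambda)\beta)}, \\
q_2(p) &= \frac{q_1(p)}{r(p)} - \frac{Q_1(p)\, r'(p)}{r(p)^2}
        = \frac{1 - f_1(Q_1(p))\, Q_2(p)\, r'(p)}{f_1(Q_1(p))\, r(p)}, \\
f_2(Q_2(p)) &= \frac{1}{q_2(p)}
        = \frac{f_1(Q_1(p))\, r(p)}{1 - f_1(Q_1(p))\, Q_2(p)\, r'(p)},
        \qquad Q_1(p) = Q_2(p)\, r(p).
\end{align*}
```

Requiring the denominator to be positive, and using $Q_2(p) = Q_1(p)/r(p)$ with $f_1$, $Q_1$ and $r$ positive, gives $r'(p) < r(p) / \{f_1(Q_1(p))\, Q_1(p)\}$, which is the role played by the constraint in the theorem.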


3.3 Details for the Computation of $f_2$

A computational drawback of our proposal is that the “true” values of the percentiles
$p_j$, i.e. those generated under the assumed model for $Y_2$, should be used in the calculation
of the likelihood. Unfortunately, these are not available, because the cumulative
distribution function $F_2$ is not given explicitly and we cannot find the $p_j$ corresponding
to the observed data $y_{2j}$ as $F_2(y_{2j}) = p_j$.

The approach we recommend to bypass this issue is to approximate the pj using the 
procedure described in Gilchrist (2000), which we summarize as follows: 

1. Denoting with $y_{2(j)}$ the ordered observed values for $Y_2$, we look for the corresponding
set of ordered $p_{(j)}$ such that $y_{2(j)} = \hat{Q}_2(p_{(j)})$, where $\hat{Q}_2(p)$ is an estimate of
$Q_2(p)$ based on the current values of the parameters $\beta$ and $\eta$ (i.e. the values from
the current MCMC draw). More specifically, we find the $p_{(j)}$ using the following
procedure: suppose $p_0$ is the current estimate of $p$ for a given $y$ value. Then, for
a value of $p$ close to $p_0$, $\hat{Q}_2(p)$ can be approximated using the following first-order
Taylor series expansion

$$\hat{Q}_2(p) \approx \hat{Q}_2(p_0) + \hat{Q}_2'(p_0)(p - p_0) = \hat{Q}_2(p_0) + \hat{q}_2(p_0)(p - p_0),$$

which, solving for $p$, gives

$$p = p_0 + \frac{y - \hat{Q}_2(p_0)}{\hat{q}_2(p_0)}, \tag{12}$$

where $\hat{q}_2(p_0)$ is the quantile density function corresponding to $\hat{Q}_2(p_0)$ and where
we used the fact that $y = \hat{Q}_2(p)$. As a starting point for $p_{(j)}$ we use $j/(n_2 + 1)$,
$j = 1, \ldots, n_2$. Equation (12) is used in an iterative fashion until the given value of
$\hat{Q}_2(p)$ differs from $y$ by less than some chosen small amount (we use 10 “®).

2. Once the values of $p_j$ are available, we compute the quantities $f_2(Q_2(p_j) \mid \beta, \eta)$ using
equation (11). The critical issue in this step is the calculation of the derivative
$X'(p, \lambda)$. In the cases we consider here (i.e. either a polynomial or a spline basis),
the derivative is available in closed form and so no further numerical approximation
is needed.


Strictly speaking, the calculation of $f_2$ provided by the procedure we just described
is not exact but involves a numerical approximation. We performed a detailed analysis
on the goodness of this approximation and we found that the actual and approximated
$f_2$ values (and hence the overall likelihood) were indistinguishable.
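
The iteration in (12) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the exponential quantile function stands in for the model-based estimate of $Q_2$, the function name `solve_percentile`, the tolerance `tol=1e-8`, the iteration cap and the clamping of $p$ to the open unit interval are all assumptions added here to keep the example self-contained and numerically safe.

```python
import math


def solve_percentile(y, Q, q, p0, tol=1e-8, max_iter=200, eps=1e-12):
    """Find p with Q(p) = y by iterating p <- p + (y - Q(p)) / q(p), as in (12).

    Q is a quantile function and q its derivative (the quantile density).
    The clamp to (eps, 1 - eps) is a safeguard added for this sketch.
    """
    p = p0
    for _ in range(max_iter):
        resid = y - Q(p)
        if abs(resid) < tol:
            break
        p = p + resid / q(p)
        p = min(max(p, eps), 1.0 - eps)
    return p


# Stand-in quantile function: exponential distribution with a hypothetical rate.
rate = 0.5
Q = lambda p: -math.log(1.0 - p) / rate
q = lambda p: 1.0 / (rate * (1.0 - p))

# Ordered observations and starting values j / (n + 1), j = 1, ..., n.
y_sorted = [0.3, 1.1, 2.0, 4.5, 9.0]
n = len(y_sorted)
p_hat = [solve_percentile(y, Q, q, (j + 1) / (n + 1)) for j, y in enumerate(y_sorted)]
```

Because the exponential distribution has an explicit distribution function, the recovered values can be compared with $1 - e^{-\text{rate} \cdot y}$; in the setting of this section no such closed form is available, which is why the iteration is needed.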


3.4 Special Cases 


We now present some special cases where an appropriate choice of the design matrix
$X(p, \lambda)$ allows one to recover an exact expression for $f_2(Q_2(p) \mid \beta, \eta)$ belonging to a known
distribution family. The proofs of these special cases are provided in Appendix 3.

Case 1: $Y_1$ is Uniform and $X(p, \lambda = 0) = 1$. In this case we assume that $Y_1 \mid \theta_1 \sim U[0, \theta_1]$
and choose $h(x) = x$. Then $Q_1(p)/Q_2(p) = \beta_0$ and from (29) it follows that

$$f_2(Q_2(p) \mid \theta_1, \beta_0) = \frac{\beta_0}{\theta_1}\, I_{[0, \theta_1/\beta_0]}\{Q_2(p)\},$$

which corresponds to the density quantile function of a uniform random variable with
parameter $\theta_2 = \theta_1/\beta_0$, where $I_A\{x\}$ denotes the indicator function taking value 1 if
$x \in A$ and 0 otherwise. Note that it correctly depends both upon the parameter $\beta_0$ and
the $Y_1$ parameter $\eta = \theta_1$.

Case 2: $Y_1$ is Log-normal and $X(p, \lambda = 1) = [1, \Phi^{-1}(p)]$. We now assume $Y_1 \mid \mu_1, \sigma_1^2 \sim \mathcal{LN}(\mu_1, \sigma_1^2)$
and fix $h(x) = \log(x)$. It follows that $\log\{Q_1(p)/Q_2(p)\} = \beta_0 + \beta_1 \Phi^{-1}(p)$,
where $\Phi^{-1}(p)$ is the quantile function of a standard normal random variable. Then by
(31) we get

$$f_2(Q_2(p) \mid \mu_1, \sigma_1^2, \beta_0, \beta_1) = \frac{1}{Q_2(p) \sqrt{2\pi}\, (\sigma_1 - \beta_1)} \exp\left\{ -\frac{[\log Q_2(p) - (\mu_1 - \beta_0)]^2}{2(\sigma_1 - \beta_1)^2} \right\},$$

which is the density quantile function of a $\mathcal{LN}(\mu_2, \sigma_2^2)$ random variable with $\mu_2 = (\mu_1 - \beta_0)$
and $\sigma_2 = (\sigma_1 - \beta_1)$. Note that the density quantile function of $Y_2$ correctly depends
both upon the parameters $\beta = (\beta_0, \beta_1)$ and the $Y_1$ parameters $\eta = (\mu_1, \sigma_1^2)$. In this case
the constraint (30) simply requires that $\beta_1 < \sigma_1$, for every $0 < p < 1$.

Case 3: $Y_1$ is Pareto and $X(p, \lambda = 1) = [1, \log(1 - p)]$. Suppose $Y_1 \mid a_1, b_1 \sim \mathcal{P}a(a_1, b_1)$
and choose $h(x) = \log(x)$. In this case $\log\{Q_1(p)/Q_2(p)\} = \beta_0 + \beta_1 \log(1 - p)$ and by
(31) we get

$$f_2(Q_2(p) \mid a_1, b_1, \beta_0, \beta_1) = \frac{a_2\, b_2^{a_2}}{Q_2(p)^{a_2 + 1}},$$

which represents the density quantile function of a $\mathcal{P}a(a_2, b_2)$ random variable with
$a_2 = \frac{a_1}{a_1 \beta_1 + 1}$ and $b_2 = b_1 e^{-\beta_0}$. The density quantile function of $Y_2$ correctly depends
both upon the parameters $\beta = (\beta_0, \beta_1)$ and the $Y_1$ parameters $\eta = (a_1, b_1)$. In this case
the constraint (30) requires that $\beta_1 > -1/a_1$, for every $0 < p < 1$.
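
The log-normal and Pareto cases can be checked numerically against the general expression of Theorem 1. The sketch below is a sanity check written for this purpose, not code from the paper: the function name `f2_density_quantile`, the parameter values and the shape/scale parametrization assumed for the Pareto density ($a$ shape, $b$ scale) are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def f2_density_quantile(p, f1, Q2, h_inv, dh_inv, Xb, dXb):
    """Evaluate the density quantile function of Theorem 1 at percentile p.

    f1: density of Y1; Q2: quantile function of Y2; h_inv, dh_inv: inverse
    link and its derivative; Xb: p -> X(p, lambda) beta; dXb: its p-derivative.
    """
    u = Xb(p)
    r = h_inv(u)
    f1_val = f1(Q2(p) * r)  # f1 evaluated at Q1(p) = Q2(p) * r
    denom = 1.0 - f1_val * dXb(p) * Q2(p) * dh_inv(u)
    return f1_val * r / denom


def lognormal_pdf(y, mu, sigma):
    return STD_NORMAL.pdf((math.log(y) - mu) / sigma) / (y * sigma)


def pareto_pdf(y, a, b):
    """Pareto density with shape a and scale b, valid for y >= b."""
    return a * b ** a / y ** (a + 1)


def case2(p, mu1=7.5, sigma1=1.75, beta0=0.5, beta1=0.25):
    """Log-normal Y1 with h = log and X(p) = [1, Phi^{-1}(p)]; needs beta1 < sigma1."""
    Xb = lambda t: beta0 + beta1 * STD_NORMAL.inv_cdf(t)
    dXb = lambda t: beta1 / STD_NORMAL.pdf(STD_NORMAL.inv_cdf(t))
    Q2 = lambda t: math.exp(mu1 + sigma1 * STD_NORMAL.inv_cdf(t) - Xb(t))
    general = f2_density_quantile(
        p, lambda y: lognormal_pdf(y, mu1, sigma1), Q2, math.exp, math.exp, Xb, dXb
    )
    closed_form = lognormal_pdf(Q2(p), mu1 - beta0, sigma1 - beta1)
    return general, closed_form


def case3(p, a1=3.0, b1=2.0, beta0=0.4, beta1=0.5):
    """Pareto Y1 with h = log and X(p) = [1, log(1 - p)]; needs beta1 > -1/a1."""
    Xb = lambda t: beta0 + beta1 * math.log(1.0 - t)
    dXb = lambda t: -beta1 / (1.0 - t)
    Q2 = lambda t: b1 * (1.0 - t) ** (-1.0 / a1) / math.exp(Xb(t))
    general = f2_density_quantile(
        p, lambda y: pareto_pdf(y, a1, b1), Q2, math.exp, math.exp, Xb, dXb
    )
    a2, b2 = a1 / (a1 * beta1 + 1.0), b1 * math.exp(-beta0)
    closed_form = pareto_pdf(Q2(p), a2, b2)
    return general, closed_form
```

Calling `case2(p)` or `case3(p)` for a few percentiles returns two values that should agree up to floating-point error, which is a quick way to catch sign or indexing mistakes when implementing the general expression.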

3.5 Prior Structure and Posterior Calculation 

The parameters $\beta$ and $\eta$ are assumed to be a priori independent, that is

$$p(\beta, \eta \mid C_\beta, C_\eta) = p(\beta \mid C_\beta) \times p(\eta \mid C_\eta),$$

where $C_\beta$ and $C_\eta$ are the prior hyperparameters for $\beta$ and $\eta$ respectively. For $p(\beta \mid C_\beta)$ we
use a multivariate normal distribution with mean equal to the ordinary least squares
(OLS) estimate of $\beta$ based on the model

$$h\left(\frac{y_{1(i)}}{y_{2(i)}}\right) = X(p_i, \lambda)\beta + \epsilon_i, \qquad i = 1, \ldots, n, \tag{13}$$

where $n = \min(n_1, n_2)$, $p_i = i/(n + 1)$, and variance-covariance matrix equal to $\sigma_\beta^2 I_{(\lambda+1)}$,
where $\sigma_\beta^2$ is normally fixed at a high value to induce a weakly informative prior distribution
for each $\beta_j$ and $I_{(\lambda+1)}$ indicates the identity matrix of size $(\lambda + 1)$. For the prior
distribution for $\eta$, the choice clearly depends upon the assumption made about $F_1$, but
we suggest using conjugate priors. For an example see the application in Section 5.

The posterior distributions of $\beta$ and $\eta$ are obtained by an MCMC simulation. In
particular, we use an independent Metropolis-Hastings algorithm with blocking over
$\beta$ and $\eta$ separately (see Gilks et al., 1996; Robert and Casella, 2004; O’Hagan and
Forster, 2004; Carlin and Louis, 2009). As the proposal distribution for $\beta$ we use a
$(\lambda + 1)$-dimensional $t$ distribution with mean and scale matrix chosen to match the $\beta$
OLS estimate and variance from (13), and a small number of degrees of freedom, usually
set to 3. As with the prior, the proposal distribution for $\eta$ depends upon the particular
application under investigation (see Section 5 for an example).

3.6 GQTE Estimation and Inference 

Once the $\beta$ parameters have been estimated and the convergence of the simulated chains
has been assessed by conventional methods (see Carlin and Louis, 2009, or Gelman et al.,
2013), we can obtain the posterior distribution of the GQTE for any choice of $g(\cdot)$ as
we now describe.

For each iteration $m$ of the MCMC simulation, a value $\beta^{(m)}$ for $\beta$ is available. Using
expressions (7) and (8) we can obtain $Q_1^{(m)}(p)$ and $Q_2^{(m)}(p)$ as

$$Q_1^{(m)}(p_{2i}) = y_{2(i)}\, h^{-1}\left[X(p_{2i}, \lambda)\beta^{(m)}\right], \qquad i = 1, \ldots, n_2,$$

and

$$Q_2^{(m)}(p_{1i}) = y_{1(i)} \left\{h^{-1}\left[X(p_{1i}, \lambda)\beta^{(m)}\right]\right\}^{-1}, \qquad i = 1, \ldots, n_1,$$

where the $y_{\ell(i)}$ are the order statistics for sample $\ell$, while $p_{\ell i} = i/(n_\ell + 1)$, $i = 1, \ldots, n_\ell$,
for $\ell \in \{1, 2\}$. It then follows that the $m$-th iteration value for $\Delta_g(p)$ is given by

$$\Delta_g^{(m)}(p) = g\left(Q_1^{(m)}(p)\right) - g\left(Q_2^{(m)}(p)\right), \qquad 0 < p < 1, \tag{14}$$

where $Q_1^{(m)}(p)$ and $Q_2^{(m)}(p)$ are found by interpolating the estimated quantile functions
$(p_{2i}, Q_1^{(m)}(p_{2i}))$ and $(p_{1i}, Q_2^{(m)}(p_{1i}))$. The estimate of $\Delta_g(p)$ is finally obtained through
the Rao-Blackwellized estimator

$$\hat{\Delta}_g(p) = \frac{1}{M} \sum_{m=1}^{M} \Delta_g^{(m)}(p), \qquad 0 < p < 1, \tag{15}$$

where $M$ is the total number of iterations. Since the whole posterior distribution of
$\Delta_g(p)$ is available, standard inferential questions can be easily addressed in the usual
ways.

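As detailed in Section 2, many interesting special cases arise from the general definition
of the GQTE. For example, if interest lies in estimating the QTE defined in (4),
$g(\cdot)$ corresponds to the identity function and expression (14) becomes

$$\Delta^{(m)}(p) = Q_1^{(m)}(p) - Q_2^{(m)}(p), \tag{16}$$

which can be evaluated for any value of $p \in (0, 1)$.

If the focus is on the ATE, defined in (5), then (14) returns the estimator

$$\Delta^{(m)} = \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2(i)}\, h^{-1}\left[X(p_{2i}, \lambda)\beta^{(m)}\right] - \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1(i)} \left\{h^{-1}\left[X(p_{1i}, \lambda)\beta^{(m)}\right]\right\}^{-1}. \tag{17}$$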


Appendix 4 contains the details for some other cases. As a final remark, note that
an appealing feature of this approach is that the MCMC procedure needs to be run only
once to compute the difference between any measure of the treatment effect of interest
in the two groups.
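
The way posterior draws of $\beta$ are turned into posterior draws of the ATE can be sketched as follows. This is an illustration under explicit assumptions, not the authors' code: it fixes $h = \log$ and the two-column basis $[1, \Phi^{-1}(p)]$, the names `design_row`, `ate_draw` and `posterior_ate` are hypothetical, and the two $\beta$ values used in the example are made up rather than produced by an MCMC run.

```python
import math
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def design_row(p):
    """Row of the design matrix X(p, lambda) for the basis [1, Phi^{-1}(p)]."""
    return (1.0, STD_NORMAL.inv_cdf(p))


def ate_draw(y1_sorted, y2_sorted, beta):
    """One draw of the ATE: mean of Q1 values built from sample 2 minus
    mean of Q2 values built from sample 1, with h^{-1} = exp."""
    n1, n2 = len(y1_sorted), len(y2_sorted)

    def ratio(p):
        return math.exp(sum(x * b for x, b in zip(design_row(p), beta)))

    q1 = [y * ratio((i + 1) / (n2 + 1)) for i, y in enumerate(y2_sorted)]
    q2 = [y / ratio((i + 1) / (n1 + 1)) for i, y in enumerate(y1_sorted)]
    return fmean(q1) - fmean(q2)


def posterior_ate(y1, y2, beta_draws):
    """Average the per-draw values, as in the estimator (15)."""
    y1s, y2s = sorted(y1), sorted(y2)
    draws = [ate_draw(y1s, y2s, beta) for beta in beta_draws]
    return fmean(draws), draws


# Toy data and made-up "posterior draws" of beta = (beta_0, beta_1).
y1 = [3.0, 1.0, 8.0]
y2 = [2.0, 4.0, 1.0, 0.5]
beta_draws = [(0.0, 0.0), (math.log(2.0), 0.0)]
estimate, draws = posterior_ate(y1, y2, beta_draws)
```

With $\beta = (0, 0)$ the quantile ratio is identically 1, so the draw reduces to the difference of the two sample means with the roles of the samples as in (7) and (8); with $\beta = (\log 2, 0)$ every $Q_1$ value is twice the matching $y_{2(i)}$. Both facts give simple checks on an implementation, and the same list of draws could be reused for other choices of $g(\cdot)$ without rerunning the sampler.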

3.7 Selecting the Number of Degrees of Freedom $\lambda$

The choice of the number of degrees of freedom $\lambda$ to use in the procedure above is not
trivial. Many approaches can be proposed, but to keep the computational burden manageable,
we propose to elicit it by minimizing an empirical version of the $L_1$ discrepancy
measure (Devroye and Lugosi, 2001)

$$L_1(\lambda) = \sum_{i=1}^{n} \left| f_2(Q_2(p_i) \mid \hat{\beta}, \hat{\eta}) - f_2^0(Q_2(p_i)) \right|, \tag{18}$$

where $f_2^0$ denotes the unknown true $Y_2$ density. More precisely, we select $\lambda$ using the
following procedure: for each $\lambda \in \{1, \ldots, \lambda_{\max}\}$, where $\lambda_{\max}$ is the maximum admissible
value for $\lambda$, we estimate $f_2(Q_2(p_i) \mid \hat{\beta}, \hat{\eta})$, where $\hat{\beta}$ is equal to the OLS estimate given
in (13) and $\eta$ is estimated by using only the data $\{y_{11}, \ldots, y_{1n_1}\}$. After replacing the
true unknown density with a kernel density estimate of the data $\{y_{21}, \ldots, y_{2n_2}\}$, the
value of $\lambda$ is chosen as that minimizing the value of (18) over the set $\{1, \ldots, \lambda_{\max}\}$. One
drawback of this approach is that it tends to select high values of $\lambda$. We provide further
discussion about this issue in Section 6.


4 Simulation Study 

In this section we report the results of a simulation study we performed to compare the
finite sample properties of our method with those of other flexible approaches. The simulation
indicates that the GQTE procedure often has a lower mean squared error than, and a
bias similar to, other flexible methods for comparing two distributions. In particular, we
contrast our proposal with the smooth quantile ratio estimation (SQUARE) approach
presented in Dominici et al. (2005), and the Probit stick-breaking process (PSBP) proposed
in Chen and Dunson (2009). The former is a frequentist semiparametric method
while the latter is a Bayesian nonparametric model. We now provide some details about
these two methodologies and a justification for using them.

In SQUARE it is assumed that the log quantile ratio is a smooth function of the
percentile $p$ with $\lambda$ degrees of freedom, that is

$$\log\left\{\frac{Q_1(p)}{Q_2(p)}\right\} = s(p, \lambda).$$

The basic idea of smooth quantile ratio estimation is to replace the empirical quantiles
with smoother versions obtained by smoothing the log-transformed ratio of the two
quantile functions across percentiles. SQUARE has been proposed by Dominici et al.
(2005) as an estimator of the mean difference between two populations with the advantage
of providing substantially lower mean squared error and bias than the sample
mean difference or the maximum likelihood estimator for log-normal populations. To
estimate $\lambda$, an R-fold cross-validation approach is suggested. Finite sample inference is
performed by bootstrap, but large sample results are also provided.

The PSBP is a general nonparametric Bayesian model which has been proposed
by Chen and Dunson (2009) for estimating the conditional distribution of a response
variable given multiple predictors. More specifically, the PSBP is a prior for an uncountable
collection of random distributions. As with the models belonging to the class
of dependent Dirichlet processes (MacEachern, 1999), the main idea of the PSBP is to allow
for dependence across a family of related distributions as a function of some covariates.
More explicitly, the PSBP induces dependence in the weights of the stick-breaking representation
(Sethuraman, 1994) by replacing the beta-distributed random variables with
a probit model. This simple change greatly enhances the flexibility of the model, thus
providing an extremely interesting extension within the framework of dependent priors
across families of probability measures.

We decided to compare the GQTE with these two methods because they are both
highly flexible and have been shown to perform well under a broad set of situations.
Originally the simulation also included the ANOVA dependent Dirichlet process mixtures
proposed by De Iorio et al. (2004), but we decided not to report it here because
it performed poorly as compared to the PSBP model. The reason for such inferior results
resides in the definition of the model itself, which assumes the weights in the
stick-breaking representation of the process to be fixed, i.e. the same for the response
distributions under the two treatment conditions.

Since SQUARE produces an estimate only for the mean difference between two 
populations, our simulation is restricted to this specific case. We are aware that the 
results only provide a partial demonstration of the GQTE advantages over the other 
methods, but we also need to stress that the ATE is the most common measure used in 
practice for estimating the extent of a treatment effect. 

Our simulation framework is similar to that used in Dominici et al. (2005) and
includes five scenarios, which are described in Table 1 under the labels A to E. In
scenarios A, B and C the $Y_2$ distribution is assumed to be log-normal with parameters
$\mu_2 = 7$ and $\sigma_2 = 1.5$, which approximately correspond to the sample statistics for the
medical expenditures of non-diseased subjects from the NMES data set. In scenario A,
the $Y_1$ distribution is also log-normal but with larger values of the parameters, namely
$\mu_1 = 7.5$ and $\sigma_1 = 1.75$. Scenarios B and C use a different assumption for the $Y_1$
distribution chosen to represent some reasonable shapes. The next two scenarios, D and
E, compare the performances of the different methods using real data. In particular,
in scenario D the data are randomly drawn from the distributions of nonzero Medicare
expenditures for cases and controls from the NMES data set. Finally, scenario E assumes
that both populations follow a gamma distribution with finite second moment.

Under each scenario we compare the mean squared error (RMSE) and bias (RB) in
percentage relative to the sample mean difference $(\bar{y}_1 - \bar{y}_2)$ for the following methods:
(1) the GQTE approach that assumes $Y_1$ to be log-normally distributed, (2) the GQTE
assuming $Y_1$ follows a gamma distribution, (3) the SQUARE method using natural cubic
splines with the number of degrees of freedom chosen by 10-fold cross-validation, (4) the
PSBP model using the treatment indicator as the only predictor. The GQTE estimators
use a natural cubic spline basis for the cubic-root transformed quantile-ratio smoother
with the number of degrees of freedom $\lambda$ chosen following the procedure detailed in
Section 3.7. The RMSE is computed by $[\{\mathrm{MSE}(\bar{y}_1 - \bar{y}_2) - \mathrm{MSE}(\hat{\Delta})\}/\mathrm{MSE}(\bar{y}_1 - \bar{y}_2)] \times 100$,
while the RB is defined as $[\{E(\hat{\Delta}) - \Delta\}/\Delta] \times 100$. Note that positive values for the RMSE
imply a better performance for the estimator as compared to the sample mean difference.

Generalized Quantile Treatment Effect

Table 1: Simulation Study - Sampling mechanisms under each simulation scenario. In
scenario D, $F_g$ ($g = 1, 2$) are the empirical cumulative distribution functions of the
nonzero medical expenditures for patients in the case and control groups from the
NMES data set, and, in scenarios B and C, $g(u) = \exp\{7 + 1.5 \cdot \Phi^{-1}(u)\}$. Moreover, in
scenario B, $s_B(u) = I_{(0,1)}(u) + I_{(0.9,1)}(u)$, while in scenario C, $s_C(u) = 8u(1 - u) I_{(0,1)}(u)$.

Scenario   Population 1                                             Population 2                          $n_1$   $n_2$
A          $\mathrm{LogN}(7.5, 1.75)$                               $\mathrm{LogN}(7, 1.5)$               100     1000
B          $u \sim \mathrm{Unif}(0, 1)$, $y_1 = g(u) e^{s_B(u)}$    $\mathrm{LogN}(7, 1.5)$               100     1000
C          $u \sim \mathrm{Unif}(0, 1)$, $y_1 = g(u) e^{s_C(u)}$    $\mathrm{LogN}(7, 1.5)$               100     1000
D          $F_1$                                                    $F_2$                                 100     1000
E          $\mathcal{G}a(2.5, 2.5/\bar{y}_1)$                       $\mathcal{G}a(2.5, 2.5/\bar{y}_2)$    100     1000
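
The two relative measures used in the simulation, RMSE and RB, can be written as small helper functions. The sketch below follows the definitions given in the text; the function names and the toy numbers are assumptions introduced only for illustration and are unrelated to the values reported in the tables.

```python
from statistics import fmean


def mse(estimates, truth):
    """Mean squared error of a collection of replicate estimates."""
    return fmean((e - truth) ** 2 for e in estimates)


def relative_mse(estimates, baseline_estimates, truth):
    """RMSE in percent: positive values favor `estimates` over the baseline."""
    mse_base = mse(baseline_estimates, truth)
    return (mse_base - mse(estimates, truth)) / mse_base * 100.0


def relative_bias(estimates, truth):
    """RB in percent: (E[estimate] - truth) / truth * 100."""
    return (fmean(estimates) - truth) / truth * 100.0


# Toy replicates: the baseline plays the role of the sample mean difference.
truth = 10.0
baseline = [8.0, 12.0, 14.0, 6.0]
candidate = [9.0, 11.0, 12.0, 10.0]

rmse_pct = relative_mse(candidate, baseline, truth)
rb_pct = relative_bias(candidate, truth)
```

In this toy example the baseline has mean squared error 10 and the candidate 1.5, so the RMSE is 85 percent, while the candidate's average of 10.5 gives an RB of 5 percent.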

The results for 100 generated data sets for each scenario are reported in Table 2.
We considered only the case of unbalanced samples with $n_1 = 100$ and $n_2 = 1000$
because it typically represents a more critical situation to deal with in practice. These
results show that the GQTE has a smaller mean squared error in most of the scenarios
considered.

Table 2: Simulation Study - Results from 100 replicate datasets. RMSE is the
mean squared error relative to $(\bar{y}_1 - \bar{y}_2)$ in percentage defined by $[\{\mathrm{MSE}(\bar{y}_1 - \bar{y}_2) -
\mathrm{MSE}(\hat{\Delta})\}/\mathrm{MSE}(\bar{y}_1 - \bar{y}_2)] \times 100$, and RB is the bias relative to $(\bar{y}_1 - \bar{y}_2)$ in percentage
defined by $[\{E(\hat{\Delta}) - \Delta\}/\Delta] \times 100$, under the data generation mechanisms described
in Table 1. The splines degrees of freedom $\lambda$ for the GQTE approach are chosen using
the heuristic algorithm described in Section 3.7 while for SQUARE we use 10-fold
cross-validation.

                                      Scenario A     Scenario B     Scenario C     Scenario D     Scenario E
                                      RMSE    RB     RMSE    RB     RMSE    RB     RMSE    RB     RMSE    RB
GQTE (log-normal)                       39   -29      -26   -23        3    13      -26    18       -1    -2
GQTE (gamma)                            33   -16        1    -9       48     7       25     0        0    -2
SQUARE                                  35    -6       -7    -5        1    13       25     3        0    -1
PSBP                                     1    -6        6   -10       -8     9      -13     6       -3    -3
$\mathrm{MSE}(\bar{y}_1 - \bar{y}_2)$   2952           5992           1051           2100            753
$\Delta$                                4982          15225           5244           7144           7144

In scenario A, where both the populations are log-normal, the GQTE assuming $Y_1$
is log-normally distributed performs around 40% better than $(\bar{y}_1 - \bar{y}_2)$, slightly better
than SQUARE, even if somewhat biased. This result is superior to that of PSBP, which
performs approximately as well as the sample mean difference. In scenario B, the PSBP
provides the best result, with a mean squared error that is 6% smaller than that of
$(\bar{y}_1 - \bar{y}_2)$, followed by the GQTE with gamma distributed $Y_1$. In scenarios C, D and E the
GQTE with gamma distributed $Y_1$ outperforms both the PSBP and SQUARE. More

















specifically, in scenario C the GQTE provides a mean squared error that is approximately
50% smaller than that of $(\bar{y}_1 - \bar{y}_2)$. This is also the least biased result. In scenarios D and E,
the GQTE approach provides results comparable to those provided by SQUARE.


5 Application: Medical Costs for Smoking Attributable Diseases

As an illustration, we apply the GQTE approach to the NMES data, where the distributions
of $Y_1$ (the cases) and $Y_2$ (the controls) are highly right-skewed. For this reason, we
decide to use $h(x) = \log(x)$. We show that having a smoking attributable disease induces
both a location and scale shift in the medical expenditure distribution as compared to
that for non-affected subjects, but with a thinning of the corresponding distribution's
tails.

5.1 Data Description 

The data used in the following analysis are taken from the National Medical Expenditure
Survey (NMES) and have been previously studied by other authors (for example
Dominici et al., 2005). The survey provides data on annual medical expenditures, disease status,
age, race, socio-economic factors, and critical information on health risk behaviors such
as smoking, for a representative sample of U.S. non-institutionalized adults (National
Center for Health Services Research, 1987). The NMES data derive from the 1987 wave. The
data set used here includes a total of 9,416 individuals. Table 3 briefly summarizes
the data set (numbers in parentheses represent the percentage of subjects with
non-zero expenditures).

Table 3: Disease cases and controls for smokers (current or former) and for non-smokers.
Numbers within parentheses represent the percentage of people in that cell with
non-zero expenditures.

            smokers        non smokers    Total
cases       165 (62%)      23 (70%)       188 (63%)
controls    4,682 (21%)    4,546 (28%)    9,228 (25%)
Total       4,847 (22%)    4,569 (28%)    9,416 (25%)



We consider as cases ($Y_1$) those individuals who are affected by smoking diseases,
namely lung cancer and chronic obstructive pulmonary disease, while the controls ($Y_2$)
are persons without a major smoking attributable disease.

In the following analyses we consider only the non-zero costs paid for each
hospitalization by diseased and non-diseased subjects.

Figures 1(a) and (c) show the histograms and boxplots for the medical costs of
the cases and controls. Both distributions are highly right-skewed, and the cases
sample is much smaller than the controls sample (118 vs. 2,262). Table 4 contains
some high-order sample quantiles for the two groups, which confirm the heavier tails of
the cases cost distribution. However, note also that the controls sample has a higher
maximum cost.


Table 4: Summary of the NMES data set: high-order quantiles of non-zero medical
expenditures for cases and controls.

Quantile order               75           90           95           99           99.9          100
Quantile for cases ($)       11,525.17    29,439.96    49,595.77    63,886.05    213,567.69    233,047.63
Quantile for controls ($)    2,600.00     9,799.664    30,625.206   49,771.60    135,896.07    238,185.94


Figures 1(b) and (d) show the histograms and boxplots for the cubic root transformed
data. The need for such a transformation derives from the particular choice we make
regarding the cases distribution (see next subsection) and is not a general requirement
of our approach. Moreover, the use of this transformation does not alter in any way the
results and the conclusions we draw, hence in the following we systematically refer to
the transformed data. Note also that, even after the transformation, the
outcome distributions still present heavy right tails, which motivates the use
of $h(x) = \log(x)$.

In Figure 2(a) we report the Q-Q plot for the (cubic root transformed) NMES data. 
We can identify a non-linear smooth relationship between the cases and controls medical 
expenditures. Panel (b) of the same picture, which shows the quantile ratio as a function 
of the percentile p, confirms these findings. 
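
The quantile ratio plotted against the percentile can be computed from two samples as in the following sketch. It is a minimal illustration, assuming linear interpolation between order statistics; the function names and the toy samples are hypothetical and are not the NMES data.

```python
def empirical_quantile(sorted_sample, p):
    """Sample quantile at percentile p in [0, 1], linearly interpolating order statistics."""
    position = p * (len(sorted_sample) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_sample) - 1)
    weight = position - lower
    return (1 - weight) * sorted_sample[lower] + weight * sorted_sample[upper]


def quantile_ratio(sample_1, sample_2, percentiles):
    """Ratio Q1(p) / Q2(p) of the two empirical quantile functions on a percentile grid."""
    sorted_1, sorted_2 = sorted(sample_1), sorted(sample_2)
    return [
        empirical_quantile(sorted_1, p) / empirical_quantile(sorted_2, p)
        for p in percentiles
    ]


# Toy samples: the "cases" are exactly twice the "controls", so the ratio is flat.
toy_controls = [1.0, 2.0, 4.0, 8.0, 16.0]
toy_cases = [2.0 * value for value in toy_controls]
toy_ratios = quantile_ratio(toy_cases, toy_controls, [0.1, 0.5, 0.9])
```

A flat ratio corresponds to a pure scale shift between the two groups; a ratio that varies smoothly with the percentile is the situation the smoothing model is designed for.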

5.2 Model Assumptions and Tuning Parameters 

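In this application we assume that $Y_1 | \pi, \theta \sim \mathcal{GSM}(\pi, \theta | J)$, a particular mixture of
gamma distributions with density

$$ f_1(y_1 | \pi, \theta) = \sum_{j=1}^{J} \pi_j \frac{\theta^j}{\Gamma(j)} y_1^{j-1} e^{-\theta y_1}, \qquad y_1 > 0, $$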



where the mixing occurs over the shape parameters and where $J$, the number of components,
is fixed a priori. First introduced in Venturini et al. (2008), it has been explicitly
developed as a model for right-skewed distributions, and its parameterization yields
a convenient and flexible method characterized by a single scale parameter for
all the gamma components, plus the ordinary set of mixture weights. We use conjugate
priors $\theta \sim Ga(\alpha, \delta)$ and $\pi \sim \mathcal{D}_J(1/J, \ldots, 1/J)$ for the shared scale parameter and the
mixture weights respectively. The number of mixture components is fixed at $J = 40$,
while the $\theta$ hyperparameters are set to $\alpha = 845$ and $\delta = 1,300$ (for more information
on the elicitation of these priors see Venturini et al., 2008, Section 2.3).
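
A density of this kind can be evaluated as in the sketch below. It rests on an assumption that should be checked against Venturini et al. (2008): the $j$-th component is taken to be a gamma density with integer shape $j$ and common rate $\theta$. The function name, the uniform weights and the value $\theta = 0.65$ (the ratio 845/1,300 of the hyperparameters, used here only as a convenient number) are illustrative choices.

```python
import math


def gsm_density(y, weights, theta):
    """Mixture of Gamma(shape=j, rate=theta) densities, j = 1..J (assumed form)."""
    if y <= 0:
        return 0.0
    total = 0.0
    for j, weight in enumerate(weights, start=1):
        # Work on the log scale to avoid overflow for large shapes.
        log_component = (
            j * math.log(theta) - math.lgamma(j) + (j - 1) * math.log(y) - theta * y
        )
        total += weight * math.exp(log_component)
    return total


# Illustrative setting: J = 40 equally weighted components.
J = 40
uniform_weights = [1.0 / J] * J
theta_example = 0.65

# Midpoint-rule check that the mixture integrates to (approximately) one.
step = 0.05
integral = step * sum(
    gsm_density((k + 0.5) * step, uniform_weights, theta_example) for k in range(4000)
)
```

Sharing a single $\theta$ across components keeps the number of free parameters at $J + 1$, which is the count used later when discussing the dimension of the sampling space.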

The initial values of the Metropolis-Hastings algorithm are chosen as follows: the $\beta$
chain is started from its OLS estimate, as discussed in (13), while for $\eta = (\theta, \pi)$ we
first get a preliminary estimate (with 5,000 iterations) using the approach described
in Venturini et al. (2008), and then we fix their starting values to the corresponding
estimated posterior averages.

[Figure 1 panels. Untransformed: (a) Diseased (cases), (c) Non-diseased (controls); horizontal axis: Medical Expenditures (x 1,000). Transformed: (b) Diseased (cases), (d) Non-diseased (controls); horizontal axis: Cubic root of Medical Expenditures.]

Figure 1: Histograms and boxplots of positive medical expenditures for hospitalizations
regarding smoking attributable diseases (lung cancer and chronic obstructive
pulmonary disease) from the 1987 National Medical Expenditure Survey (for clarity of
exposition, the histogram of the original expenditures has been truncated at the top).

[Figure 2 panels. Horizontal axis of panel (a): Cubic root Medical Expenditures for Controls.]

Figure 2: (a) Q-Q plot of cubic root transformed non-zero medical expenditures,
(b) Quantile ratio across percentiles with a fitted natural cubic spline.

We run the MCMC algorithm for 1,000,000 iterations plus 200,000 iterations as
burn-in. Such a large number of iterations is necessary because the model is quite
complicated and its chains converge slowly.

5.3 Results 

The selection procedure described in Subsection 3.7 for the number of degrees of freedom
suggested a value of $\lambda$ equal to 6, which appears fairly satisfactory from a
visual inspection of the scatterplot (see Figure 2(b)).

The acceptance rates for the MCMC posterior simulations are relatively small, being
around 0.5% for $\beta$, 25% for $\theta$ and 1.6% for $\pi$. Despite that, we do not consider
these results as problematic since the chain is moving in a high-dimensional space
($(\lambda + 1) + (J + 1) = 48$ dimensions), which necessarily slows down the convergence
process. This is the main reason why we decide to run the simulation for a longer
time. However, the results of the analysis presented below indicate that convergence
was attained. We made other attempts with simpler (but less flexible) specifications
of the $Y_1$ distribution, which showed a more conventional behavior of the acceptance
rates.
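
The link between dimension and acceptance rate can be illustrated with a toy experiment. The sketch below is not the sampler used in this paper: it runs a plain random-walk Metropolis algorithm on a standard normal target with a fixed proposal step, and all names, the step size and the number of iterations are assumptions made for the illustration.

```python
import math
import random


def acceptance_rate(dimension, step=0.5, iterations=5000, seed=0):
    """Share of accepted random-walk Metropolis moves on a standard normal target."""
    rng = random.Random(seed)
    # Start from a draw of the target so that the chain begins in its typical region.
    state = [rng.gauss(0.0, 1.0) for _ in range(dimension)]
    log_target = -0.5 * sum(x * x for x in state)
    accepted = 0
    for _ in range(iterations):
        proposal = [x + rng.gauss(0.0, step) for x in state]
        log_target_proposal = -0.5 * sum(x * x for x in proposal)
        log_ratio = log_target_proposal - log_target
        # Metropolis rule: accept with probability min(1, target ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            state, log_target = proposal, log_target_proposal
            accepted += 1
    return accepted / iterations


rate_low_dimension = acceptance_rate(2)
rate_high_dimension = acceptance_rate(48)
```

With the step held fixed, the acceptance rate in 48 dimensions is markedly lower than in 2 dimensions, which is the qualitative effect invoked above; the actual rates reported for $\beta$, $\theta$ and $\pi$ depend on the model and on the proposals used.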

Figure 3 shows the fitted values for the estimated model (6). The gray dots represent
the quantile ratio for the transformed data as a function of the percentile $p$.
The solid line illustrates the estimated posterior mean of the quantile ratio, while the
dashed one represents its OLS estimate (the same line as in Figure 2(b)). The shaded
area gives the credible bands for the estimated posterior means, showing a fairly low
amount of uncertainty around the estimates. Moreover, from the figure we can see that
our model is less sensitive to extreme observations, especially in the right tails of the
distributions.



Figure 3: Fitted values of the estimated model (6). The solid line represents the estimated
pointwise posterior means, while the shaded area corresponds to their pointwise
95% credible intervals. The dashed line corresponds to the OLS fit for the same data,
as described in (13).


In Figure 4 we report the estimated $Y_2$ density, which shows a quite
good fit. In the display, together with the $f_2(Q_2(p_{2i}) | \theta, \pi, \beta)$ posterior mean, we put
the corresponding 95% credible bands and the histogram of the data.

We now describe the results for the GQTE $\Delta_g(p)$ introduced in Section 3 for some
choices of the function $g(\cdot)$. We start from the QTE, denoted as $\Delta(p)$ in (16), whose
estimate is shown in Figure 5. The solid line represents the posterior mean of the
medical costs QTE between cases and controls and the gray area is the corresponding
95% credible interval, while the dashed line portrays the sample quantile differences.
We can see that the distribution of the medical expenditures for subjects with smoking
attributable diseases is always above that of those without smoking-related diseases.
However, the difference is estimated with much larger variability at the very
extreme quantiles. This behavior is not too surprising since the two samples become
very sparse as the medical expenditures become bigger (see the boxplots in Figure 1).

Figures 6(a) and (b) contain the posterior distributions of the ATE and the standard
deviation difference, as defined in Section 2. The sample mean and standard deviation
differences are depicted in the two plots with a vertical dotted line, while the 95%
credible intervals are indicated using dashed lines. The ATE estimated posterior mean
(on the log scale) is equal to 6.1127, while the estimated posterior mean of the standard
deviation difference is 2.6275. These results indicate that having a smoking attributable
disease has a significant negative impact on both the location and scale of the single
hospitalization medical cost distribution. After re-transforming the estimated quantiles
back to the original scale, we get an estimated posterior mean for the ATE between diseased
and non-diseased subjects equal to $6,244.10.

[Figure 4 horizontal axis: Cubic root Medical Expenditures for Controls.]

Figure 4: Estimated $Y_2$ density. The solid line represents the estimated pointwise posterior
means, while the shaded area shows the corresponding 95% credible intervals. The
thinner dark gray line depicts the data histogram.

Figure 5: Estimated Quantile Treatment Effect (QTE), defined as $\Delta(p) = Q_1(p) - Q_2(p)$.
The solid line reports the estimated pointwise posterior means, the shaded area gives the
corresponding 95% credible intervals, while the dashed line shows the sample quantile
differences. Data are cubic-root transformed.

Figure 6: (a) Estimated posterior distribution of the Average Treatment Effect (ATE)
between cases and controls medical expenditures (cubic root transformed), as defined
in (5). The vertical dashed lines represent the 95% credible interval, while the dotted
line is the sample mean difference. (b) Estimated posterior distribution of the standard
deviation difference between cases and controls medical expenditures (cubic root
transformed), as defined in Section 2. The vertical dashed lines represent the 95% credible
interval, while the dotted line depicts the sample standard deviation difference.

Finally, Figure 7 shows the impact of the treatment variable (i.e., having or not having a
smoking attributable disease) on the tailweight functions $TW(p)$, defined in (22), of the
two populations. The tails of the medical costs distribution for the diseased subjects tend
to be heavier than those of the non-diseased ones for values of $p$ up to approximately
0.6, but the situation is inverted as we move to consider higher percentiles. Hence, while
the fact of being affected by smoking attributable diseases tends to increase both the
average and the variance of the medical expenditures distribution, we have found that
the opposite occurs to the tail probabilities, that is, to the chances of incurring very
high medical costs in a single hospitalization.

As a last comment, we would like to remark on the explicit choice we made to exclude
the observations with null medical costs. We took this decision because the inclusion
of this further feature of the data requires the extension of our approach to a two-part
modeling framework (Mullahy, 1998; Cameron and Trivedi, 2005), which does not appear
to be straightforward in our context.

Figure 7: Estimated tailweight difference $\Delta_{TW}(p) = TW_1(p) - TW_2(p)$ between cases
and controls, as defined in (22). The solid line represents the estimated pointwise posterior
means, while the shaded area shows the corresponding 95% credible intervals.

6 Discussion 

In this paper we have introduced a new parameter, the GQTE, for assessing the effect
of a binary covariate on a response and a novel methodology to estimate it. The GQTE
generalizes the most common approaches available in the literature, that is, the
well-known average treatment effect (ATE) and the quantile treatment effect (QTE), since
it allows one to evaluate the effect of a treatment on any arbitrary characteristic of the
outcome's distributions under the two treatment conditions.

To estimate the GQTE we have proposed a Bayesian procedure, where we assume
that a monotone transformation of the quantile ratio is modeled as a smooth function of
the percentiles. This assumption increases efficiency by borrowing information
across the two groups. The idea of quantile ratio smoothing was first introduced
by Dominici et al. (2005). In the present work we extended that proposal in several
ways: 1) we let the link between the quantile ratio and the percentiles be general and
application-specific, which makes it possible to take into account the tail heaviness of
the distributions involved in the analysis; 2) we derive a closed form expression for the
model likelihood; 3) our methodology is not limited to the mean difference between the
treated and the controls, but provides a comprehensive assessment of the treatment effect;
4) finally, we embed the whole estimation process within a Bayesian framework, which makes
it possible to make inference on the GQTE $\Delta_g(p)$ for any choice of the function $g(\cdot)$,
and for both symmetric and highly skewed outcomes.

The GQTE is a marginal measure in the sense that it provides an estimate of the
treatment effect over an entire population. In the econometrics literature this kind of
approach is usually termed the unconditional QTE (Firpo, 2007; Frölich and Melly,
2008) in contrast with the conditional QTE, where the treatment effect is determined
separately for different combinations of a set of covariates (Koenker and Bassett, 1978;
Koenker, 2005; Angrist and Pischke, 2009). The inclusion of covariates can improve the
efficiency of an estimator even when the primary goal of the analysis is a marginal effect.
Accordingly, methods have been proposed to extract marginal quantiles from estimates
of conditional quantiles (Machado and Mata, 2005; Frölich and Melly, 2008). A challenge
in extending our approach along these lines is the lack of an “iterated expectation”
result^1 for the quantiles (see for example Angrist and Pischke, 2009, Chapter 7).

To further clarify our goals, we want to stress that in this paper no particular
emphasis has been placed on the causality issues that naturally come into play when
the objective is the estimation of a treatment effect (see for example Rosenbaum, 2002,
2010; Rubin, 2006; Angrist and Pischke, 2009). More precisely, our intent here is solely
to provide a general measure of the effect of a binary treatment on a response variable,
together with a flexible approach to estimate it.

We compared the performance of our estimation approach with other highly flexible
methods in a simulation study for the mean difference between two populations. Our
study revealed that the GQTE generally performs better than the other competing
estimators, at least in estimating the mean difference.

We have applied our methodology to the NMES data set to assess the effect of being
affected by smoking attributable diseases on the single hospitalization medical costs
distribution. We have found that having these diseases increases the average medical
bill amount as well as its variability in the population, while it reduces the probability
of incurring higher bills.

Our approach can be extended in various directions. The most promising research
question we can see involves taking into account individual level characteristics in
measuring the effect of a treatment. In our context, this would involve the estimation of a
conditional version of $\Delta_g(p)$, something like $\Delta_g(p|x) = g(Q_1(p|x)) - g(Q_2(p|x))$. The
clear advantage of including covariates would be an increase in the efficiency of the
estimates (Frölich and Melly, 2008). To control for systematic differences in covariates
between two populations, a common strategy is to group units into subclasses based on
covariate values, for example using propensity score matching, and then to apply our
method within strata of propensity scores (Rosenbaum, 2002, 2010), as implemented
for example in Dominici and Zeger (2005).

Currently we are considering only a binary treatment effect, so another important 
line of research is the extension of the methods to categorical ordinal and to continuous 
treatments. 

A further direction for future research concerns the choice of the number of degrees
of freedom $\lambda$. In this paper we adopted the simple approach of choosing $\lambda$ by minimizing
an empirical version of the $L_1$ distance between the $Y_2$ density estimate and its kernel
density estimate (see Subsection 3.7). More structured solutions can obviously be
considered. A natural extension would allow $\lambda$ to be a random quantity to be estimated
together with all the other parameters using a trans-dimensional MCMC approach, like
for example the reversible jump algorithm (Green, 1995). While this solution would
also take into account the uncertainty connected to the a priori ignorance
about the $\lambda$ value, the consequence would be a dramatic increase in the computational
workload of the estimation algorithm.

^1 While for a standard linear model, in fact, the assumption $E(Y_i | X_i) = X_i'\beta$ does imply
$E(Y_i) = E(X_i)'\beta$, the same conclusion does not hold for the conditional quantiles.
Appendix 1: Additional GQTE Examples

Together with the cases presented in Section 2, many other less conventional measures
of the difference between two distributions can be obtained by properly choosing the $g(\cdot)$
function in the GQTE definition. For example, by choosing $g(x) = \int_0^1 x^r \, dp$ we obtain
the difference between the population $r$-th moments

$$ \Delta_{\mu_r} = \int_0^1 Q_1(p)^r \, dp - \int_0^1 Q_2(p)^r \, dp. \tag{19} $$


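Using the fact that for a random variable $Y$ with expected value $\mu$, variance $\sigma^2$ and
quantile function $Q(p)$ it holds that (see Gilchrist, 2000 or Shorack, 2000)

$$ \int_0^1 [Q(p) - \mu]^2 \, dp = \int_0^1 Q(p)^2 \, dp - \mu^2, $$

by suitably choosing the $g(\cdot)$ function, we recover the difference between the two
population variances as

$$
\begin{aligned}
\Delta_{\sigma^2} &= \left[ \int_0^1 Q_1(p)^2 \, dp - \left( \int_0^1 Q_1(p) \, dp \right)^2 \right] - \left[ \int_0^1 Q_2(p)^2 \, dp - \left( \int_0^1 Q_2(p) \, dp \right)^2 \right] \\
&= \int_0^1 Q_1(p)^2 \, dp - \int_0^1 Q_2(p)^2 \, dp - \left[ \left( \int_0^1 Q_1(p) \, dp \right)^2 - \left( \int_0^1 Q_2(p) \, dp \right)^2 \right] \\
&= \sigma_1^2 - \sigma_2^2.
\end{aligned}
\tag{20}
$$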
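
The moment and variance differences can be checked numerically from the quantile functions alone. The sketch below is an illustration under stated assumptions: it uses a simple midpoint rule on a percentile grid and two exponential distributions chosen only because their quantile functions are available in closed form; all function names are hypothetical.

```python
import math


def quantile_moment(quantile, r, grid_size=100_000):
    """Midpoint-rule approximation of the integral of Q(p)^r over (0, 1)."""
    step = 1.0 / grid_size
    return step * sum(quantile((k + 0.5) * step) ** r for k in range(grid_size))


def variance_from_quantiles(quantile):
    """Variance as the second quantile moment minus the squared first one."""
    return quantile_moment(quantile, 2) - quantile_moment(quantile, 1) ** 2


def exponential_quantile(rate):
    """Quantile function of an exponential distribution with the given rate."""
    return lambda p: -math.log(1.0 - p) / rate


q1 = exponential_quantile(1.0)  # mean 1, variance 1
q2 = exponential_quantile(2.0)  # mean 0.5, variance 0.25

mean_difference = quantile_moment(q1, 1) - quantile_moment(q2, 1)
variance_difference = variance_from_quantiles(q1) - variance_from_quantiles(q2)
```

For these two distributions the exact differences are 0.5 for the means and 0.75 for the variances, and the numerical values agree with them up to the discretization error of the grid.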
However, the cases encompassed by the GQTE include many other quantile-based
indexes that are less frequently used in the literature, like the inter-$p$-range
$ipr(p) = Q(1 - p) - Q(p)$, or the skewness-ratio $sr(p) = [Q(1 - p) - Q(0.5)]/[Q(0.5) - Q(p)]$,
$0 < p < 1$, which provide robust measures of the scale and shape of a distribution (for a
list of these indexes see Gilchrist, 2000; Shorack, 2000; Parzen, 2004; Wang and Serfling,
2005; Brys et al., 2006). A quantity of particular interest to economists is the difference
between inter-decile ratios, defined as

$$ \frac{Q_1(0.9)}{Q_1(0.1)} - \frac{Q_2(0.9)}{Q_2(0.1)}, $$

which is commonly used to measure the inequality in a population (see Frölich and
Melly, 2008). The previous quantity can be easily generalized as follows

$$ \Delta_{ir}(p) = \frac{Q_1(1 - p)}{Q_1(p)} - \frac{Q_2(1 - p)}{Q_2(p)} \tag{21} $$

for any $0 < p < 0.5$. Notice that all these indexes are obtainable from the general
definition (3) by properly choosing the function $g(\cdot)$.
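
These quantile-based indexes are straightforward to compute once a quantile function is available. The following sketch is illustrative only: the function names are hypothetical, and the Pareto and uniform quantile functions are used simply because they have closed forms.

```python
def inter_p_range(quantile, p):
    """ipr(p) = Q(1 - p) - Q(p)."""
    return quantile(1 - p) - quantile(p)


def skewness_ratio(quantile, p):
    """sr(p) = [Q(1 - p) - Q(0.5)] / [Q(0.5) - Q(p)]."""
    median = quantile(0.5)
    return (quantile(1 - p) - median) / (median - quantile(p))


def inter_quantile_ratio_difference(quantile_1, quantile_2, p):
    """Difference of the ratios Q(1 - p) / Q(p) between the two populations, as in (21)."""
    return quantile_1(1 - p) / quantile_1(p) - quantile_2(1 - p) / quantile_2(p)


def pareto_quantile(shape, scale):
    """Quantile function of a Pareto distribution: scale * (1 - p)^(-1 / shape)."""
    return lambda p: scale * (1 - p) ** (-1 / shape)


def uniform_quantile(p):
    """Quantile function of a uniform distribution on (0, 1)."""
    return p


# Inter-decile ratio difference between a heavier-tailed and a lighter-tailed Pareto.
decile_ratio_difference = inter_quantile_ratio_difference(
    pareto_quantile(1.0, 1.0), pareto_quantile(2.0, 1.0), 0.1
)
```

For a Pareto distribution with shape $a$ the inter-decile ratio equals $9^{1/a}$, so the difference in the example is $9 - 3 = 6$, and the symmetric uniform distribution has skewness-ratio equal to one.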


As a last example, we consider a further GQTE special case that is based on the
so-called tailweight function, defined as

$$ TW(p) = \frac{1}{Q(p)} \frac{dQ(p)}{dp} = \frac{d}{dp} \log Q(p), \qquad 0 < p < 1, $$

which is used to quantify the probability allocated in the tails of a distribution. One can
compute the difference between the tailweight functions for two populations by choosing
the logarithmic derivative of the quantile function as the $g(\cdot)$ functional in (3), that is

$$ \Delta_{TW}(p) = TW_1(p) - TW_2(p) = \frac{d}{dp} \log Q_1(p) - \frac{d}{dp} \log Q_2(p). \tag{22} $$

If $\Delta_{TW}(p) > 0$, we can conclude that the treatment is causing a thickening of the $Y_1$
distribution tails as compared to those of $Y_2$, and a thinning if $\Delta_{TW}(p) < 0$. Finally, note that, thanks
to the equivariance property of the quantiles, (22) can be written also as

$$
\begin{aligned}
\Delta_{TW}(p) &= \frac{d}{dp} \log Q_1(p) - \frac{d}{dp} \log Q_2(p) \\
&= \frac{d}{dp} \left[ Q_{1,\log}(p) - Q_{2,\log}(p) \right] \\
&= \frac{d}{dp} \Delta_{\log}(p),
\end{aligned}
\tag{23}
$$

where $Q_{\ell,\log}$, $\ell = 1, 2$, indicates the quantile of the log-transformed data and $\Delta_{\log}(p)$
denotes the parameter (4) calculated on the quantiles of the log-transformed data.
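
The tailweight difference can be approximated numerically from two quantile functions, as sketched below. The central finite difference, the step size and the function names are assumptions made for illustration; the Pareto example is chosen because its tailweight function has the closed form $1/\{a(1 - p)\}$.

```python
import math


def tailweight(quantile, p, h=1e-6):
    """Central finite-difference approximation of d/dp log Q(p)."""
    return (math.log(quantile(p + h)) - math.log(quantile(p - h))) / (2 * h)


def tailweight_difference(quantile_1, quantile_2, p):
    """Difference of the two tailweight functions at percentile p, as in (22)."""
    return tailweight(quantile_1, p) - tailweight(quantile_2, p)


def pareto_quantile(shape, scale):
    """Quantile function of a Pareto distribution: scale * (1 - p)^(-1 / shape)."""
    return lambda p: scale * (1.0 - p) ** (-1.0 / shape)


# Population 1 has the heavier tail (smaller shape), so the difference is positive.
difference_at_median = tailweight_difference(
    pareto_quantile(1.0, 1.0), pareto_quantile(2.0, 1.0), 0.5
)
```

At $p = 0.5$ the exact value is $1/(1 \cdot 0.5) - 1/(2 \cdot 0.5) = 1$, a positive difference that signals thicker tails in the first population.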
Appendix 2: Proof of Theorem 1 and Corollaries

Proof of Theorem 1. Differentiate (7) with respect to $p$ to get

$$ q_1(p) = q_2(p) \, h^{-1}[X(p,\lambda)\beta] + X'(p,\lambda)\beta \, Q_2(p) \left\{ \frac{\partial}{\partial (X(p,\lambda)\beta)} h^{-1}[X(p,\lambda)\beta] \right\}, \tag{24} $$

where $q_\ell(p) = dQ_\ell(p)/dp$ denotes the so-called quantile density function for the
population $\ell = 1, 2$, while $X'(p,\lambda)$ corresponds to the derivative of $X(p,\lambda)$ with respect to $p$, $0 < p < 1$
(properly resized because a constant is normally included in the design matrix $X(p,\lambda)$).

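Apply now to both $q_1(p)$ and $q_2(p)$ the following relationship between the quantile
density and density quantile functions (see for example Gilchrist, 2000; Parzen, 1979,
2004)

$$ f(Q(p)) \, q(p) = 1 \tag{25} $$

to get the expression

$$ \frac{1}{f_1(Q_1(p)|\eta)} = \frac{h^{-1}[X(p,\lambda)\beta]}{f_2(Q_2(p)|\beta,\eta)} + X'(p,\lambda)\beta \, Q_2(p) \left\{ \frac{\partial}{\partial (X(p,\lambda)\beta)} h^{-1}[X(p,\lambda)\beta] \right\}, \tag{26} $$

and hence

$$ f_2(Q_2(p)|\beta,\eta) = \frac{f_1(Q_1(p)|\eta) \, h^{-1}[X(p,\lambda)\beta]}{1 - f_1(Q_1(p)|\eta) \, X'(p,\lambda)\beta \, Q_2(p) \left\{ \frac{\partial}{\partial (X(p,\lambda)\beta)} h^{-1}[X(p,\lambda)\beta] \right\}}. \tag{27} $$

Finally, substituting (7) in place of $Q_1(p)$ proves the main statement.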

Moreover, $f_2(Q_2(p)|\beta,\eta)$ is a proper density function because:

• $f_2(Q_2(p)|\beta,\eta) \ge 0$, for any $0 < p < 1$; since $f_1(Q_1(p)|\eta) \ge 0$ and $h^{-1}[X(p,\lambda)\beta] > 0$
because, as assumed in (6), it is the ratio of two positive quantile functions, this
fact can be proved by showing that the denominator of (27) is nonnegative, which
is ensured by the constraint (10).

• $\int_0^1 f_2(Q_2(p)|\beta,\eta) \, q_2(p) \, dp = 1$, which is true because $f_2(Q_2(p)|\beta,\eta) \, q_2(p) = 1$ by
construction. □


A couple of immediate consequences of Theorem 1 regard two cases that occur 
frequently in practice. We provide the details about these situations in the next two 
corollaries. 

Corollary 1. Let the same assumptions of Theorem 1 hold. Suppose additionally that
$h(x) = x$. If for every $0 < p < 1$ the vector $\beta$ satisfies the constraint

$$ \frac{X'(p,\lambda)\beta}{X(p,\lambda)\beta} \le \frac{1}{f_1(Q_1(p)) \, Q_1(p)}, \tag{28} $$

then the density quantile function $f_2(Q_2(p)|\beta,\eta)$ for $Y_2$ is

$$ f_2(Q_2(p)|\beta,\eta) = \frac{f_1(Q_2(p) X(p,\lambda)\beta \,|\, \eta) \, X(p,\lambda)\beta}{1 - f_1(Q_2(p) X(p,\lambda)\beta \,|\, \eta) \, X'(p,\lambda)\beta \, Q_2(p)}. \tag{29} $$

Corollary 2. Let the same assumptions of Theorem 1 hold. Suppose additionally that
$h(x) = \log(x)$. If for every $0 < p < 1$ the vector $\beta$ satisfies the constraint

$$ X'(p,\lambda)\beta < \frac{1}{f_1(Q_1(p)) \, Q_1(p)}, \tag{30} $$

then the density quantile function $f_2(Q_2(p)|\beta,\eta)$ for $Y_2$ is given by

$$ f_2(Q_2(p)|\beta,\eta) = \frac{f_1(Q_2(p) e^{X(p,\lambda)\beta} \,|\, \eta)}{e^{-X(p,\lambda)\beta} - f_1(Q_2(p) e^{X(p,\lambda)\beta} \,|\, \eta) \, X'(p,\lambda)\beta \, Q_2(p)}. \tag{31} $$

Note that in these two situations, the general constraint (10) reduces to a linear
constraint on $\beta$.


Appendix 3: Proofs of the Special Cases 

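Case 1: $Y_1$ is Uniform and $X(p, \lambda = 0) = 1$. Here $Y_1 | \theta_1 \sim \mathcal{U}[0, \theta_1]$ and $h(x) = x$. In
this case $Q_1(p)/Q_2(p) = \beta_0$. The density, distribution and quantile functions of
$Y_1$ are respectively

$$ f_1(y_1|\theta_1) = \frac{1}{\theta_1} I_{[0,\theta_1]}\{y_1\}, \qquad F_1(y_1|\theta_1) = \frac{y_1}{\theta_1}, \qquad Q_1(p|\theta_1) = \theta_1 p, \qquad 0 < p < 1. $$

From (29) it follows that

$$ f_2(Q_2(p)|\theta_1,\beta_0) = \frac{1}{\theta_1} I_{[0,\theta_1]}\{Q_2(p)\beta_0\} \, \beta_0 = \frac{\beta_0}{\theta_1} I_{[0,\theta_1/\beta_0]}\{Q_2(p)\}, $$

which is the density quantile function of a $\mathcal{U}[0, \theta_2]$ random variable with $\theta_2 = \theta_1/\beta_0$.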

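Case 2: $Y_1$ is Log-normal and $X(p, \lambda = 1) = [1, \Phi^{-1}(p)]$. Assume $Y_1 | \mu_1, \sigma_1^2 \sim \mathcal{LN}(\mu_1, \sigma_1^2)$
and $h(x) = \log(x)$. In this case $\log\{Q_1(p)/Q_2(p)\} = \beta_0 + \beta_1 \Phi^{-1}(p)$, where $\Phi^{-1}(p)$ is
the quantile function of a standard normal random variable. The density, distribution
and quantile functions of $Y_1$ are given by

$$ f_1(y_1|\mu_1,\sigma_1^2) = \frac{1}{y_1 \sqrt{2\pi} \sigma_1} \exp\left\{ -\frac{(\log y_1 - \mu_1)^2}{2\sigma_1^2} \right\}, $$

$$ F_1(y_1|\mu_1,\sigma_1^2) = \Phi\left( \frac{\log y_1 - \mu_1}{\sigma_1} \right), $$

$$ Q_1(p|\mu_1,\sigma_1^2) = \exp\{ \mu_1 + \sigma_1 \Phi^{-1}(p) \}, \qquad 0 < p < 1. $$

Then by (31) it follows

$$
\begin{aligned}
f_2(Q_2(p)|\mu_1,\sigma_1^2,\beta_0,\beta_1)
&= \frac{ \dfrac{1}{\sqrt{2\pi}\sigma_1 \exp\{\mu_1 + \sigma_1 \Phi^{-1}(p)\}} \exp\left\{ -\dfrac{[\Phi^{-1}(p)]^2}{2} \right\} }{ \exp\{-\beta_0 - \beta_1 \Phi^{-1}(p)\} \left( 1 - \dfrac{\beta_1}{\sigma_1} \right) } \\
&= \frac{1}{\sqrt{2\pi}(\sigma_1 - \beta_1) \exp\{(\mu_1 - \beta_0) + (\sigma_1 - \beta_1)\Phi^{-1}(p)\}} \exp\left\{ -\frac{[\Phi^{-1}(p)]^2}{2} \right\} \\
&= \frac{1}{\sqrt{2\pi}(\sigma_1 - \beta_1) \exp\{(\mu_1 - \beta_0) + (\sigma_1 - \beta_1)\Phi^{-1}(p)\}} \exp\left\{ -\frac{[(\mu_1 - \beta_0) + (\sigma_1 - \beta_1)\Phi^{-1}(p) - (\mu_1 - \beta_0)]^2}{2(\sigma_1 - \beta_1)^2} \right\} \\
&= \frac{1}{Q_2(p) \sqrt{2\pi}(\sigma_1 - \beta_1)} \exp\left\{ -\frac{[\log Q_2(p) - (\mu_1 - \beta_0)]^2}{2(\sigma_1 - \beta_1)^2} \right\},
\end{aligned}
$$

which is the density quantile function of a $\mathcal{LN}(\mu_2, \sigma_2^2)$ random variable with $\mu_2 = (\mu_1 - \beta_0)$
and $\sigma_2 = (\sigma_1 - \beta_1)$.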
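
The log-normal case can be verified numerically by evaluating the right-hand side of (31) and comparing it with the log-normal density quantile function with parameters $\mu_1 - \beta_0$ and $\sigma_1 - \beta_1$. The sketch below is only a check under stated assumptions: the parameter values and function names are hypothetical, and the derivative of $\Phi^{-1}(p)$ is computed as the reciprocal of the standard normal density.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def lognormal_density(y, mu, sigma):
    """Log-normal density with log-scale mean mu and log-scale standard deviation sigma."""
    exponent = -((math.log(y) - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (y * sigma * math.sqrt(2 * math.pi))


def f2_via_corollary_2(p, mu1, sigma1, beta0, beta1):
    """Right-hand side of (31) for X(p) = [1, Phi^{-1}(p)] and log-normal Y1."""
    z = STANDARD_NORMAL.inv_cdf(p)
    x_beta = beta0 + beta1 * z                      # X(p, lambda) beta
    x_prime_beta = beta1 / STANDARD_NORMAL.pdf(z)   # X'(p, lambda) beta
    q2 = math.exp(mu1 + sigma1 * z - x_beta)        # Q2(p) = Q1(p) exp(-X beta)
    f1 = lognormal_density(q2 * math.exp(x_beta), mu1, sigma1)
    return f1 / (math.exp(-x_beta) - f1 * x_prime_beta * q2)


def f2_closed_form(p, mu1, sigma1, beta0, beta1):
    """Density quantile function of a log-normal with mu2 = mu1 - beta0, sigma2 = sigma1 - beta1."""
    z = STANDARD_NORMAL.inv_cdf(p)
    mu2, sigma2 = mu1 - beta0, sigma1 - beta1
    return lognormal_density(math.exp(mu2 + sigma2 * z), mu2, sigma2)


# Hypothetical parameter values with sigma1 > beta1, so that sigma2 is positive.
example = dict(mu1=1.0, sigma1=1.2, beta0=0.3, beta1=0.4)
pairs = [
    (f2_via_corollary_2(p, **example), f2_closed_form(p, **example))
    for p in (0.1, 0.5, 0.9)
]
```

The two evaluations agree up to floating-point error at every percentile, which matches the algebra of the derivation above.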

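Case 3: $Y_1$ is Pareto and $X(p, \lambda = 1) = [1, \log(1 - p)]$. Now $Y_1 | a_1, b_1 \sim \mathcal{P}a(a_1, b_1)$
and $h(x) = \log(x)$. In this case $\log\{Q_1(p)/Q_2(p)\} = \beta_0 + \beta_1 \log(1 - p)$. The density,
distribution and quantile functions of $Y_1$ are given by

$$ f_1(y_1|a_1,b_1) = a_1 b_1^{a_1} y_1^{-(a_1+1)}, \qquad F_1(y_1|a_1,b_1) = 1 - b_1^{a_1} y_1^{-a_1}, \qquad Q_1(p|a_1,b_1) = b_1 (1 - p)^{-1/a_1}, \qquad 0 < p < 1. $$

Then (31) implies

$$
\begin{aligned}
f_2(Q_2(p)|a_1,b_1,\beta_0,\beta_1)
&= \frac{ a_1 b_1^{a_1} \left[ b_1 (1 - p)^{-1/a_1} \right]^{-(a_1+1)} }{ \exp\{-\beta_0 - \beta_1 \log(1 - p)\} \left[ 1 + a_1 \beta_1 \right] } \\
&= \frac{a_1}{a_1 \beta_1 + 1} \, \frac{1}{b_1 e^{-\beta_0}} \, (1 - p)^{\frac{a_1 + 1}{a_1} + \beta_1} \\
&= \frac{a_1}{a_1 \beta_1 + 1} \left( b_1 e^{-\beta_0} \right)^{\frac{a_1}{a_1 \beta_1 + 1}} \left[ Q_2(p) \right]^{-\left( \frac{a_1}{a_1 \beta_1 + 1} + 1 \right)},
\end{aligned}
$$

which is the density quantile function of a $\mathcal{P}a(a_2, b_2)$ random variable with $a_2 = a_1/(a_1 \beta_1 + 1)$
and $b_2 = b_1 e^{-\beta_0}$.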
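
As for the log-normal case, the Pareto result can be checked numerically against (31). The sketch below assumes the parameter mapping $a_2 = a_1/(a_1\beta_1 + 1)$ and $b_2 = b_1 e^{-\beta_0}$ implied by the quantile relationship; the parameter values and function names are hypothetical.

```python
import math


def pareto_density(y, shape, scale):
    """Pareto density a * b^a * y^-(a + 1) on y >= b."""
    return shape * scale ** shape * y ** (-(shape + 1)) if y >= scale else 0.0


def f2_via_corollary_2_pareto(p, a1, b1, beta0, beta1):
    """Right-hand side of (31) for X(p) = [1, log(1 - p)] and Pareto Y1."""
    x_beta = beta0 + beta1 * math.log(1 - p)   # X(p, lambda) beta
    x_prime_beta = -beta1 / (1 - p)            # X'(p, lambda) beta
    q1 = b1 * (1 - p) ** (-1 / a1)
    q2 = q1 * math.exp(-x_beta)                # Q2(p) = Q1(p) exp(-X beta)
    f1 = pareto_density(q1, a1, b1)
    return f1 / (math.exp(-x_beta) - f1 * x_prime_beta * q2)


def f2_closed_form_pareto(p, a1, b1, beta0, beta1):
    """Density quantile function of a Pareto with the implied parameters a2 and b2."""
    a2 = a1 / (a1 * beta1 + 1)
    b2 = b1 * math.exp(-beta0)
    return (a2 / b2) * (1 - p) ** ((a2 + 1) / a2)


# Hypothetical parameter values.
example = dict(a1=3.0, b1=2.0, beta0=0.2, beta1=0.1)
pairs = [
    (f2_via_corollary_2_pareto(p, **example), f2_closed_form_pareto(p, **example))
    for p in (0.1, 0.5, 0.9)
]
```

The closed form is written directly in terms of $p$, using the fact that a Pareto density quantile function equals $(a/b)(1 - p)^{(a + 1)/a}$.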


Appendix 4: Details About the Estimation of Other Cases 

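One can estimate the impact of a binary treatment on the $r$-th moments, denoted as
$\Delta_{\mu_r}$ in (19), by computing

$$ \Delta_{\mu_r}^{(m)} = \frac{1}{n_2} \sum_{i=1}^{n_2} \left\{ y_{2(i)} \, h^{-1}\left[ X(p_{2i},\lambda)\beta^{(m)} \right] \right\}^r - \frac{1}{n_1} \sum_{i=1}^{n_1} \left\{ \frac{y_{1(i)}}{h^{-1}\left[ X(p_{1i},\lambda)\beta^{(m)} \right]} \right\}^r. \tag{32} $$

The last expression allows one to estimate the treatment effect on the population variances,
defined in (20), which is given by

$$ \Delta_{\sigma^2}^{(m)} = \Delta_{\mu_2}^{(m)} - \left[ \left( \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2(i)} \, h^{-1}\left[ X(p_{2i},\lambda)\beta^{(m)} \right] \right)^2 - \left( \frac{1}{n_1} \sum_{i=1}^{n_1} \frac{y_{1(i)}}{h^{-1}\left[ X(p_{1i},\lambda)\beta^{(m)} \right]} \right)^2 \right]. \tag{33} $$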


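As a concluding example, the effect of a binary treatment on the tailweight functions
of two distributions, introduced in (22), can be obtained by first computing the posterior
draws

$$ \Delta_{TW}^{(m)}(p) = \frac{d}{dp} \log h^{-1}\left[ X(p,\lambda)\beta^{(m)} \right] \tag{34} $$

and then by applying (15). When $h(x) = \log(x)$, (34) becomes

$$ \Delta_{TW}^{(m)}(p) = X'(p,\lambda)\beta^{(m)} \tag{35} $$

and the estimate of $\Delta_{TW}(p)$ is

$$ \hat{\Delta}_{TW}(p) = \frac{1}{M} \sum_{m=1}^{M} X'(p,\lambda)\beta^{(m)} = X'(p,\lambda) \left( \frac{1}{M} \sum_{m=1}^{M} \beta^{(m)} \right) = X'(p,\lambda)\bar{\beta}, \tag{36} $$

with $\bar{\beta} = \frac{1}{M} \sum_{m=1}^{M} \beta^{(m)}$ the posterior mean estimate of $\beta$.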
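
Because (35) is linear in $\beta^{(m)}$, averaging the draws of the tailweight difference gives the same value as plugging the posterior mean of $\beta$ into $X'(p,\lambda)$. The sketch below illustrates this with made-up posterior draws and with the design $X(p) = [1, \log(1 - p)]$ of Case 3, both chosen only for the example; the function names are hypothetical.

```python
def design_derivative(p):
    """X'(p) for the illustrative design X(p) = [1, log(1 - p)]."""
    return [0.0, -1.0 / (1.0 - p)]


def tailweight_draw(p, beta):
    """One posterior draw of the tailweight difference when h(x) = log(x), as in (35)."""
    return sum(x * b for x, b in zip(design_derivative(p), beta))


def tailweight_estimate(p, beta_draws):
    """Average of the posterior draws of the tailweight difference."""
    return sum(tailweight_draw(p, beta) for beta in beta_draws) / len(beta_draws)


def posterior_mean(beta_draws):
    """Componentwise average of the posterior draws of beta."""
    count = len(beta_draws)
    return [sum(draw[k] for draw in beta_draws) / count for k in range(len(beta_draws[0]))]


# Made-up posterior draws of beta = (beta_0, beta_1), for illustration only.
draws = [[0.2, -0.1], [0.4, -0.3], [0.3, -0.2]]
average_of_draws = tailweight_estimate(0.5, draws)
plug_in_value = tailweight_draw(0.5, posterior_mean(draws))
```

Both quantities equal 0.4 in this toy example, so in practice only the posterior mean of $\beta$ needs to be stored to evaluate the estimate on any percentile grid.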